- Health & Medicine (0.67)
- Information Technology (0.46)
Appendix A On the Assumptions and Efficacy of the White Noise Test
In this section we provide visualizations to better understand the statistical power of our test, and to verify the claims in Section 2.3. We can see that residual sequences R constructed from outlier images generally include a higher proportion of unexplained semantic information: comparing the CelebA residual in Fig. 3(a) (second column), where the model is trained on CIFAR-10, to Fig. 3(b) (first column), where CelebA is the inlier dataset, we can see that the facial structure in the CelebA residual is more evident when the model is trained on CIFAR-10. Similarly, comparing the CIFAR-10 residuals from the two models, the structure of the vehicle is more evident when CIFAR-10 is the outlier dataset. As the residual sequences constructed from outliers tend to have more natural-image-like structure, they will also have stronger spatial autocorrelations than residuals from inlier samples, which should in principle be white noise. Note that while the residual sequences constructed from inliers also contain unexplained semantic information, this is due to the estimation error of the deep AR model; it would not occur if we had access to the ground-truth model, as shown in Section 2.2.
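The intuition above — outlier residuals retain image-like structure while inlier residuals behave as white noise — can be sketched with a lag-1 autocorrelation check in one dimension. This is an illustrative toy, not the test statistic used in the paper: the 1.96 critical value, the Gaussian null approximation r1 ~ N(0, 1/n), and the synthetic sequences are all assumptions made here.

```python
import math
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def rejects_white_noise(residual, z=1.96):
    """Reject the white-noise null at ~5% when |r1| > z / sqrt(n).

    Under H0 (i.i.d. residuals) the lag-1 autocorrelation is
    approximately N(0, 1/n)."""
    return abs(lag1_autocorr(residual)) > z / math.sqrt(len(residual))

rng = random.Random(0)
inlier_like = [rng.gauss(0.0, 1.0) for _ in range(2000)]
outlier_like = [math.sin(0.2 * i) + rng.gauss(0.0, 0.3) for i in range(2000)]

print(rejects_white_noise(outlier_like))  # True: the residual keeps smooth structure
print(rejects_white_noise(inlier_like))   # usually False for pure noise
```

A residual with leftover structure produces a strong positive autocorrelation and is rejected, while i.i.d. noise stays inside the critical band with high probability.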
Further Analysis of Outlier Detection with Deep Generative Models
Wang, Ziyu
The recent, counter-intuitive discovery that deep generative models (DGMs) can frequently assign a higher likelihood to outliers has implications for both outlier detection applications as well as our overall understanding of generative modeling. In this work, we present a possible explanation for this phenomenon, starting from the observation that a model's typical set and high-density region may not coincide. From this vantage point we propose a novel outlier test, the empirical success of which suggests that the failure of existing likelihood-based outlier tests does not necessarily imply that the corresponding generative model is uncalibrated. We also conduct additional experiments to help disentangle the impact of low-level texture versus high-level semantics in differentiating outliers. In aggregate, these results suggest that modifications to the standard evaluation practices and benchmarks commonly applied in the literature are needed.
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
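The typical-set versus high-density distinction underlying the abstract above can be reproduced exactly for a standard Gaussian: in high dimension the mode (the all-zeros point) attains the maximal density, yet samples never land there — they concentrate on a thin shell of radius about sqrt(d). A minimal sketch, assuming only N(0, I_d) with d = 1000:

```python
import math
import random

d = 1000
rng = random.Random(0)

def logpdf_std_normal(x):
    """Log-density of N(0, I_d) at x."""
    return -0.5 * sum(v * v for v in x) - 0.5 * len(x) * math.log(2 * math.pi)

sample = [rng.gauss(0.0, 1.0) for _ in range(d)]  # a typical draw
mode = [0.0] * d  # the density-maximizing point, yet atypical

# The mode beats every sample on likelihood...
print(logpdf_std_normal(mode) > logpdf_std_normal(sample))  # → True

# ...but draws concentrate on a thin shell of radius ~ sqrt(d), far from 0.
radius = math.sqrt(sum(v * v for v in sample))
print(abs(radius - math.sqrt(d)) < 5.0)  # → True (sqrt(1000) ≈ 31.6)
```

This is the Gaussian analogue of a constant image receiving higher likelihood than any natural image: high density does not imply typicality, so thresholding on likelihood alone can mis-rank outliers.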
Robust Density Estimation under Besov IPM Losses
We study minimax convergence rates of nonparametric density estimation under the Huber contamination model, in which a "contaminated" proportion of the data comes from an unknown outlier distribution. We provide the first results for this problem under a large family of losses, called Besov integral probability metrics (IPMs), that include L^p, Wasserstein, Kolmogorov-Smirnov, Cramer-von Mises, and other commonly used metrics. Under a range of smoothness assumptions on the population and outlier distributions, we show that a re-scaled thresholding wavelet estimator converges at the minimax optimal rate under a wide variety of losses and also exhibits optimal dependence on the contamination proportion. We also provide a purely data-dependent extension of the estimator that adapts to both an unknown contamination proportion and the unknown smoothness of the true density. Finally, based on connections shown recently between density estimation under IPM losses and generative adversarial networks (GANs), we show that certain GAN architectures are robustly minimax optimal.
- Health & Medicine (0.67)
- Information Technology (0.46)
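The thresholding idea behind the wavelet estimator can be illustrated with a single-level Haar transform on binned data. The paper's estimator is a multiresolution construction with a re-scaled threshold, so everything below — one level, the Haar basis, a fixed hard threshold — is a simplifying assumption of this sketch:

```python
def haar_dwt(x):
    """One level of the orthonormal Haar transform; len(x) must be even."""
    s = 0.5 ** 0.5
    approx = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    detail = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert haar_dwt."""
    s = 0.5 ** 0.5
    out = []
    for a, dt in zip(approx, detail):
        out.extend([s * (a + dt), s * (a - dt)])
    return out

def threshold_denoise(hist, thresh):
    """Hard-threshold the Haar detail coefficients of binned data."""
    approx, detail = haar_dwt(hist)
    detail = [dt if abs(dt) > thresh else 0.0 for dt in detail]
    return haar_idwt(approx, detail)

bins = [1.0, 2.0, 3.0, 4.0]
print([round(v, 6) for v in threshold_denoise(bins, 0.0)])   # → [1.0, 2.0, 3.0, 4.0]
print([round(v, 6) for v in threshold_denoise(bins, 10.0)])  # → [1.5, 1.5, 3.5, 3.5]
```

Zeroing small detail coefficients suppresses bin-level fluctuations (and, with a suitably scaled threshold, the contribution of a contaminated fraction of the data) while the approximation coefficients preserve the bulk shape of the density.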
Accelerating LLM Inference with Flexible N:M Sparsity via A Fully Digital Compute-in-Memory Accelerator
Ramachandran, Akshat, Kundu, Souvik, Raha, Arnab, Kundu, Shamik, Mathaikutty, Deepak K., Krishna, Tushar
Large language model (LLM) pruning with fixed N:M structured sparsity significantly limits the expressivity of the sparse model, yielding sub-optimal performance. On the contrary, support for more than one N:M pattern to provide sparse representational freedom yields a costly overhead in the hardware. To mitigate these challenges for LLMs, we first present a flexible layer-wise outlier-density-aware N:M sparsity (FLOW) selection method. FLOW enables the identification of optimal layer-wise N and M values (from a given range) by simultaneously accounting for the presence and distribution of outliers, allowing a higher degree of representational freedom. To deploy the sparse models with such N:M flexibility, we then present a flexible, low-overhead digital compute-in-memory architecture (FlexCiM). FlexCiM enables support for diverse sparsity patterns by partitioning a digital CiM (DCiM) macro into smaller sub-macros, which are adaptively aggregated and disaggregated through distribution and merging mechanisms for different values of N and M. Extensive experiments on both transformer-based and recurrence-based state space foundation models (SSMs) demonstrate that FLOW outperforms existing alternatives with an accuracy improvement of up to 36%, while FlexCiM delivers up to 1.75× lower inference latency and 1.5× lower energy consumption compared to existing sparse accelerators.

To reduce the colossal size of large language models (LLMs) and enable their efficient deployment on resource-constrained devices, post-training pruning has emerged as an effective model compression method [9], [33], [37]. It reduces the memory footprint of the pre-trained LLMs by removing ineffectual model parameters, at the granularity of individual weights (unstructured) or blocks of weights (structured), and storing sparse tensors in a compressed format (CSR/CSC) [14].
Notably, model pruning may yield compute acceleration via skipping ineffectual computations associated with the zero-valued weight/activation. However, traditional weight pruning often requires fine-tuning, which becomes exceedingly compute-heavy for LLMs. Furthermore, this often requires the model to yield structured pruned weights, which can cause a high accuracy drop compared to the models pruned via an unstructured approach.
- Semiconductors & Electronics (0.46)
- Energy (0.34)
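Plain N:M structured sparsity — keep the N largest-magnitude weights in every contiguous block of M — can be sketched in a few lines. This shows only the masking step; FLOW's contribution is choosing N and M per layer from outlier statistics, which is not modeled here, and `prune_n_of_m` is a name invented for this sketch:

```python
def prune_n_of_m(weights, n, m):
    """Keep the n largest-magnitude weights in every block of m (N:M sparsity).

    `weights` is a flat list whose length is a multiple of m."""
    assert len(weights) % m == 0
    out = []
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        keep = sorted(range(m), key=lambda i: abs(block[i]), reverse=True)[:n]
        out.extend(w if i in keep else 0.0 for i, w in enumerate(block))
    return out

row = [0.9, -0.1, 0.05, -1.2, 0.3, 0.02, -0.4, 0.11]
print(prune_n_of_m(row, 2, 4))  # → [0.9, 0.0, 0.0, -1.2, 0.3, 0.0, -0.4, 0.0]
```

Because every block has exactly n survivors, the mask can be stored with fixed per-block metadata, which is what makes N:M patterns hardware-friendly compared with unstructured sparsity.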
SANDRO: a Robust Solver with a Splitting Strategy for Point Cloud Registration
Adlerstein, Michael, Soares, João Carlos Virgolino, Bratta, Angelo, Semini, Claudio
Point cloud registration is a critical problem in computer vision and robotics, especially in the field of navigation. Current methods often fail when faced with high outlier rates or take a long time to converge to a suitable solution. In this work, we introduce a novel algorithm for point cloud registration called SANDRO (Splitting strategy for point cloud Alignment using Non-convex anD Robust Optimization), which combines an Iteratively Reweighted Least Squares (IRLS) framework with a robust loss function with graduated non-convexity. This approach is further enhanced by a splitting strategy designed to handle high outlier rates and skewed distributions of outliers. SANDRO addresses important limitations of existing methods in challenging scenarios where high outlier rates and point cloud symmetries significantly hinder convergence. SANDRO achieves a superior success rate compared to state-of-the-art methods, demonstrating a 20% improvement over the current state of the art on the real-world Redwood dataset and a 60% improvement on synthetic data.
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Italy (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
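The IRLS-with-graduated-non-convexity machinery can be sketched on a deliberately tiny problem: estimating a 1-D translation between paired points with one gross outlier, using Geman-McClure-style weights whose surrogate parameter mu is annealed from nearly quadratic toward the non-convex loss. The kernel scale, the annealing factor, and the scalar problem itself are all simplifying assumptions; SANDRO solves full rigid registration with a splitting strategy on top:

```python
def irls_translation(src, dst, mu0=64.0, iters=30):
    """Estimate a 1-D translation dst ≈ src + t with IRLS under a
    Geman-McClure robust loss and graduated non-convexity (GNC)."""
    t, mu, c2 = 0.0, mu0, 1.0  # c2: squared kernel scale (assumed)
    for _ in range(iters):
        # GNC Geman-McClure weights: near 1 when mu is large (quadratic
        # surrogate), sharply down-weighting large residuals as mu -> 1.
        w = [(mu * c2 / ((d - s - t) ** 2 + mu * c2)) ** 2
             for s, d in zip(src, dst)]
        # Weighted least-squares update for the translation.
        t = sum(wi * (d - s) for wi, s, d in zip(w, src, dst)) / sum(w)
        mu = max(1.0, mu / 1.4)  # anneal toward the true non-convex loss
    return t

src = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
dst = [2.0, 3.0, 4.0, 5.0, 6.0, 50.0]  # true t = 2 plus one gross outlier
print(round(irls_translation(src, dst), 2))  # → 2.0
```

As mu shrinks, the outlier's weight collapses and the estimate settles near the true translation; a plain least-squares fit would be pulled far off by the single corrupted correspondence.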
OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning
Li, Pengxiang, Yin, Lu, Gao, Xiaowei, Liu, Shiwei
The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks. However, the substantial size of LLMs presents significant challenges in training or fine-tuning. While parameter-efficient approaches such as low-rank adaptation (LoRA) have gained popularity, they often compromise performance compared to full-rank fine-tuning. In this paper, we propose Outlier-weighed Layerwise Sampled Low-Rank Projection (OwLore), a new memory-efficient fine-tuning approach, inspired by the layerwise outlier distribution of LLMs, which dynamically samples pre-trained layers to fine-tune instead of adding additional adaptors. We first interpret the outlier phenomenon through the lens of Heavy-Tailed Self-Regularization theory (HT-SR), discovering that layers with more outliers tend to be more heavy-tailed and consequently better trained. Inspired by this finding, OwLore strategically assigns higher sampling probabilities to layers with more outliers to better leverage the knowledge stored in pre-trained LLMs. To further mitigate the memory demands of fine-tuning, we integrate gradient low-rank projection into our approach, which facilitates each layer to be efficiently trained in a low-rank manner. By incorporating the efficient characteristics of low-rank and optimal layerwise sampling, OwLore significantly improves the memory-performance trade-off in LLM pruning. Our extensive experiments across various architectures, including LLaMa2, LLaMa3, and Mistral, demonstrate that OwLore consistently outperforms baseline approaches, including full fine-tuning. Specifically, it achieves up to a 1.1% average accuracy gain on the Commonsense Reasoning benchmark, a 3.0% improvement on MMLU, and a notable 10% boost on MT-Bench, while being more memory efficient. OwLore allows us to fine-tune LLaMa2-7B with only 21GB of memory.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- Asia > China > Liaoning Province > Dalian (0.04)
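The layerwise sampling step in OwLore can be approximated with a toy outlier criterion: count weights whose magnitude exceeds a multiple of the layer's mean absolute value, then sample layers proportionally. The threshold rule, the tau parameter, and the +1 smoothing are all inventions of this sketch; the paper's outlier statistics come from its HT-SR-based analysis of the pre-trained weights:

```python
def layer_sampling_probs(layer_weights, tau=2.0):
    """Sampling probability per layer, proportional to a toy outlier count.

    A weight counts as an outlier when |w| > tau * (layer mean |w|);
    the +1 keeps outlier-free layers sampleable."""
    counts = []
    for w in layer_weights:
        mean_abs = sum(abs(v) for v in w) / len(w)
        counts.append(1 + sum(1 for v in w if abs(v) > tau * mean_abs))
    total = sum(counts)
    return [c / total for c in counts]

layers = [
    [0.1, -0.1, 0.1, 5.0, -4.0],    # heavy-tailed: two large outliers
    [0.1, 0.12, -0.09, 0.11, -0.1], # well-concentrated: none
]
print(layer_sampling_probs(layers))  # → [0.75, 0.25]
```

At each fine-tuning step one would draw a subset of layers from this distribution and update only those, so better-trained (more heavy-tailed) layers are visited more often.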
Resilient VAE: Unsupervised Anomaly Detection at the SLAC Linac Coherent Light Source
Humble, Ryan, Colocho, William, O'Shea, Finn, Ratner, Daniel, Darve, Eric
Significant advances in utilizing deep learning for anomaly detection have been made in recent years. However, these methods largely assume the existence of a normal training set (i.e., uncontaminated by anomalies) or even a completely labeled training set. In many complex engineering systems, such as particle accelerators, labels are sparse and expensive; in order to perform anomaly detection in these cases, we must drop these assumptions and utilize a completely unsupervised method. This paper introduces the Resilient Variational Autoencoder (ResVAE), a deep generative model specifically designed for anomaly detection. ResVAE exhibits resilience to anomalies present in the training data and provides feature-level anomaly attribution. During the training process, ResVAE learns the anomaly probability for each sample as well as each individual feature, utilizing these probabilities to effectively disregard anomalous examples in the training data. We apply our proposed method to detect anomalies in the accelerator status at the SLAC Linac Coherent Light Source (LCLS). By utilizing shot-to-shot data from the beam position monitoring system, we demonstrate the exceptional capability of ResVAE in identifying various types of anomalies that are visible in the accelerator.
- Oceania > Australia > Victoria > Melbourne (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Tennessee > Knox County > Knoxville (0.04)
- (10 more...)
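The core mechanism of ResVAE — learning a per-sample anomaly probability and using it to discount suspicious training points — can be approximated with a crude stand-in: a robust z-score of the reconstruction error mapped through a logistic. ResVAE learns these probabilities jointly with the model rather than computing them post hoc, and the median/MAD rule, the offset of 3, and `anomaly_probs` are all inventions of this sketch:

```python
import math

def anomaly_probs(errors):
    """Per-sample anomaly probability from a robust z-score of the
    reconstruction error (median/MAD), squashed by a logistic centered
    at z = 3."""
    srt = sorted(errors)
    med = srt[len(srt) // 2]
    mad = sorted(abs(e - med) for e in errors)[len(errors) // 2] * 1.4826 + 1e-12
    return [1.0 / (1.0 + math.exp(-(abs(e - med) / mad - 3.0))) for e in errors]

errors = [1.0, 1.1, 0.9, 1.05, 9.0]  # reconstruction errors; last one anomalous
print([round(p, 2) for p in anomaly_probs(errors)])  # → [0.09, 0.09, 0.27, 0.05, 1.0]
```

A training loop would then weight each sample's loss term by 1 - p, effectively disregarding likely anomalies so they do not contaminate the learned model of normal behavior.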